The <- operator assigns a value to a symbol.
x <- 10
print(x)
## [1] 10
txt <- "Hello World!"
txt
## [1] "Hello World!"
The # character indicates a comment. Anything to the right of the # (including the # itself) is ignored. This is the only comment character in R.
Exercise 1: What happens if you run the following code in your R script?
x <- # This expression is incomplete
x <- 101 # nothing printed
x # auto-printing occurs
## [1] 101
print(x) # explicit printing
## [1] 101
The [1] in the output indicates that x is a vector and 101 is its first element.
y <- 3:17
y
## [1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
The numbers in the [] are not part of the vector itself.
The : operator is used to create an integer sequence.
Exercise 2: In your Exercise R script, create a sequence of integers from 2000 to 2010.
R has 5 basic atomic classes: logical, integer, numeric (real numbers), complex, and character.
To explicitly declare an integer, you need to specify the L suffix.
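As a small illustration of the L suffix (the variable names a and b are chosen here only for the example), the same value can be stored either as a numeric or as an integer:

```r
a <- 5    # no suffix: stored as numeric (double)
b <- 5L   # L suffix: stored as integer
class(a)
## [1] "numeric"
class(b)
## [1] "integer"
a == b            # the values compare as equal
## [1] TRUE
identical(a, b)   # but the objects differ in type
## [1] FALSE
```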
A special number is Inf which represents infinity. You can use Inf in ordinary calculations; e.g. 1 / Inf is 0.
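A brief sketch of how Inf behaves in ordinary arithmetic (pos_inf and neg_inf are illustrative names, not part of the original material):

```r
pos_inf <- 1 / 0     # dividing by zero gives Inf rather than an error
pos_inf
## [1] Inf
neg_inf <- -1 / 0
neg_inf
## [1] -Inf
1 / pos_inf
## [1] 0
pos_inf + 1          # Inf absorbs finite numbers
## [1] Inf
is.finite(pos_inf)
## [1] FALSE
```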
R also has many data structures. These include vectors, matrices, lists, factors, and data frames.
The vector is the most common and basic data structure in R. Empty vector: You can create a vector by pre-defining its length and/or type.
x0 <- vector() # empty vector
x1 <- vector(length = 10) # with a pre-defined length
## with a length and type
x2 <- vector("character", length = 10)
x3 <- vector("numeric", length = 10)
x4 <- vector("integer", length = 10)
x5 <- vector("logical", length = 10)
You can examine a vector x using the following functions:
typeof(x)
length(x)
class(x)
str(x)
Exercise 3: In your Exercise R script, print and examine the vectors x0, x1, x2, x3, x4, and x5.
You can also create vectors using the c() function.
num_vec <- c(1,2,3,4,5) # numeric vector
num_vec
## [1] 1 2 3 4 5
To explicitly create a vector of integers:
int_vec <- c(1L,2L,3L,4L,5L) # vector of integers
int_vec
## [1] 1 2 3 4 5
To create a logical vector:
logical_vec <- c(TRUE, TRUE, FALSE) # logical vector
logical_vec
## [1] TRUE TRUE FALSE
To create a character vector:
char_vec <- c("apple", "pear", "banana","grape") # character vector
char_vec
## [1] "apple" "pear" "banana" "grape"
You can also add elements to your vector.
int_vec <- c(int_vec,6L)
int_vec
## [1] 1 2 3 4 5 6
Exercise 4: In your Exercise R script, add two more fruits to char_vec.
Attention: Vectors have only one type. If you add more than one type to your vector, R will create a vector of the least common denominator type, following the coercion rule logical < integer < numeric < complex < character.
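The coercion rule can be seen with a few mixed-type vectors. This is only a sketch; v1, v2, and v3 are names made up for the example:

```r
v1 <- c(FALSE, 2L)    # logical + integer -> integer
class(v1)
## [1] "integer"
v1                    # FALSE has become 0
## [1] 0 2
v2 <- c(2L, 3.5)      # integer + numeric -> numeric
class(v2)
## [1] "numeric"
v3 <- c(3.5, 1+2i)    # numeric + complex -> complex
class(v3)
## [1] "complex"
```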
Exercise 5: In your Exercise R script, without running the code, guess the type of each resulting vector.
x <- c(10.1, "b")
x <- c(TRUE, 33)
x <- c("city", TRUE)
Objects can be explicitly coerced using the as.<class_name> functions.
x <- 1:5
as.numeric(x)
## [1] 1 2 3 4 5
as.logical(x)
## [1] TRUE TRUE TRUE TRUE TRUE
as.character(x)
## [1] "1" "2" "3" "4" "5"
as.complex(x)
## [1] 1+0i 2+0i 3+0i 4+0i 5+0i
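Explicit coercion does not always make sense. As a hedged illustration (fruit, as_num, and as_lgl are names chosen for this example), when R cannot convert a value it returns NA instead:

```r
fruit <- c("apple", "pear", "banana")
as_num <- as.numeric(fruit)   # R also issues a warning about NAs here
as_num
## [1] NA NA NA
as_lgl <- as.logical(fruit)   # strings other than "TRUE"/"FALSE" variants become NA
as_lgl
## [1] NA NA NA
```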
Matrices are vectors with a dimension attribute.
m <- matrix(nrow = 2, ncol = 3)
m
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA NA NA
dim(m)
## [1] 2 3
attributes(m)
## $dim
## [1] 2 3
Matrices are constructed column-wise, so entries can be thought of starting in the “upper left” corner and running down the columns.
m <- matrix(1:6, nrow = 2, ncol = 3)
m
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
Matrices can also be created directly from vectors by adding a dimension attribute.
m <- 1:10
m
## [1] 1 2 3 4 5 6 7 8 9 10
dim(m) <- c(2, 5)
m
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
Matrices can be created by column-binding or row-binding with the cbind() and rbind() functions.
x <- 1:3
y <- 10:12
z_column <- cbind(x, y)
z_row <- rbind(x, y)
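To make the difference between the two functions concrete, the sketch below repeats the binding and inspects the results. cbind() treats each vector as a column, while rbind() treats each as a row:

```r
x <- 1:3
y <- 10:12
z_column <- cbind(x, y)
z_column
##      x  y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
dim(z_column)
## [1] 3 2
z_row <- rbind(x, y)
z_row
##   [,1] [,2] [,3]
## x    1    2    3
## y   10   11   12
dim(z_row)
## [1] 2 3
```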
A list is a special type of vector that can contain elements of different classes.
Lists can be created explicitly with the list() function, or other objects can be coerced to a list with the as.list() function.
x <- list(3:7, "a", TRUE, 1+4i)
x
## [[1]]
## [1] 3 4 5 6 7
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+4i
x <- 3:10
x <- as.list(x)
length(x)
## [1] 8
Exercise 6: In your Exercise R script, run the code above. What is the class of x[1]? How about x[[1]]?
Factors are used to represent categorical data and can be unordered or ordered.
Factor objects can be created with the factor() function.
x <- factor(c("yes", "yes", "no", "yes", "no"))
x
## [1] yes yes no yes no
## Levels: no yes
table(x)
## x
## no yes
## 2 3
x
## [1] yes yes no yes no
## Levels: no yes
unclass(x)
## [1] 2 2 1 2 1
## attr(,"levels")
## [1] "no" "yes"
attr(x,"levels")
## [1] "no" "yes"
The order of the levels of a factor can be set using the levels argument to factor().
x <- factor(c("yes", "yes", "no", "yes", "no"))
x ## Levels are put in alphabetical order
## [1] yes yes no yes no
## Levels: no yes
x <- factor(c("yes", "yes", "no", "yes", "no"),
levels = c("yes", "no"))
x
## [1] yes yes no yes no
## Levels: yes no
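Setting the levels argument controls the order of the levels, but the factor is still unordered. As a sketch (the sizes data is invented for this example), adding ordered = TRUE to factor() creates an ordered factor whose levels can be compared:

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
is.ordered(sizes)
## [1] TRUE
levels(sizes)
## [1] "small"  "medium" "large"
sizes < "large"   # comparisons follow the level order
## [1]  TRUE FALSE  TRUE  TRUE
```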
Exercise 7: In your Exercise R script, order the fruit names below by size.
"Apple","Blueberry","Grape","Grapefruit","Plum","Watermelon"
Data frames are used to store tabular data in R. Hadley Wickham’s package dplyr has an optimized set of functions designed to work efficiently with data frames.
Characteristics of a data frame: each column can hold a different class of data, all columns have the same length, and every data frame has a special attribute called row.names.
A data frame can be created by reading in a dataset with read.table() or read.csv(), created explicitly with the data.frame() function, or coerced from other types of objects such as lists.
Data frames can be converted to a matrix by calling data.matrix().
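A minimal sketch of the last two points, coercing a list to a data frame and then converting it to a matrix. The names cols, scores_df, and scores_mat are hypothetical, and the columns are kept numeric so the conversion is straightforward:

```r
cols <- list(id = 1:3, score = c(2.5, 3.0, 4.5))
scores_df <- as.data.frame(cols)   # each list element becomes a column
scores_df
##   id score
## 1  1   2.5
## 2  2   3.0
## 3  3   4.5
scores_mat <- data.matrix(scores_df)
is.matrix(scores_mat)
## [1] TRUE
dim(scores_mat)
## [1] 3 2
```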
df <- data.frame(id = letters[1:10], x = 1:10, y = rnorm(10))
df
## id x y
## 1 a 1 -0.2999690
## 2 b 2 -0.3192881
## 3 c 3 0.6519987
## 4 d 4 0.4285505
## 5 e 5 -0.6914915
## 6 f 6 -0.1796887
## 7 g 7 -0.4609372
## 8 h 8 -0.2804702
## 9 i 9 1.1547752
## 10 j 10 0.8251494
nrow(df)
## [1] 10
ncol(df)
## [1] 3
names(df)
## [1] "id" "x" "y"
names(df) <- c("Identity", "Position", "Value")
names(df)
## [1] "Identity" "Position" "Value"
row.names(df) # show the row names
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
row.names(df) <- c("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10") # assign row names to data frame
row.names(df) # show the row names
## [1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8" "a9" "a10"
Missing values are denoted by NA; undefined mathematical operations produce NaN.
is.na() is used to test whether objects are NA.
is.nan() is used to test for NaN.
NA values have a class also, so there are integer NA, character NA, etc.
A NaN value is also NA, but the converse is not true.
## Create a vector with NAs in it
x <- c(1, 2, NA, 10, 3)
## Return a logical vector indicating which elements are NA
is.na(x)
## [1] FALSE FALSE TRUE FALSE FALSE
## Return a logical vector indicating which elements are NaN
is.nan(x)
## [1] FALSE FALSE FALSE FALSE FALSE
## Now create a vector with both NA and NaN values
x <- c(1, 2, NaN, NA, 4)
is.na(x)
## [1] FALSE FALSE TRUE TRUE FALSE
is.nan(x)
## [1] FALSE FALSE TRUE FALSE FALSE
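The typed NA values mentioned above can be inspected directly. The following sketch uses R's built-in NA constants and also checks the relationship between NA and NaN:

```r
class(NA)              # a plain NA is logical
## [1] "logical"
class(NA_integer_)
## [1] "integer"
class(NA_real_)
## [1] "numeric"
class(NA_character_)
## [1] "character"
is.na(0 / 0)           # 0 / 0 is NaN, and NaN counts as NA
## [1] TRUE
is.nan(NA)             # but NA is not NaN
## [1] FALSE
```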
Exercise 8: In your Exercise R script:
1. Create the following data frame df.
| Var1 | x | y | z | k |
|---|---|---|---|---|
| a | 1.20 | Apple | yes | 14.07 |
| b | 1.00 | Apple | yes | 14.01 |
| c | 3.30 | Orange | no | 15.57 |
| d | 0.84 | Banana | no | 15.58 |
| e | 5.00 | Coconut | no | 15.16 |
2. Change the variable names (Var1, x, y, z, k) to (ID, Cost_unit, Fruit, Color_red, Price).
3. Order the Fruit levels as (Banana < Apple < Orange < Coconut) and the Color_red levels as (no < yes).